Siglip So400m 14 980 Flash Attn2 Navit
Apache-2.0
SigLIP-based vision model that raises the maximum input resolution to 980x980 by interpolating the positional embeddings, and that applies the NaViT strategy to process images at variable resolution while preserving their aspect ratio.
Text-to-Image
Transformers
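
To make the variable-resolution idea in the description concrete, here is a minimal sketch of how an image size could be mapped to a patch grid without distorting its aspect ratio. The helper name `navit_patch_grid` and its rounding policy (downscale only, then round each side down to a whole number of patches) are assumptions for illustration, not the model's actual preprocessing. The patch size of 14 and the 980-pixel limit are read from the model name and description.

```python
def navit_patch_grid(height, width, patch_size=14, max_side=980):
    """Hypothetical helper: pick a patch grid that keeps the aspect ratio.

    Returns (new_height, new_width, patch_rows, patch_cols).
    The rounding policy is an assumption made for this sketch.
    """
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")

    longest = max(height, width)
    if longest > max_side:
        # Downscale both sides by the same factor (integer arithmetic
        # avoids floating-point rounding surprises).
        height = height * max_side // longest
        width = width * max_side // longest

    # Round each side down to a multiple of the patch size,
    # keeping at least one patch per side.
    new_h = max(patch_size, height // patch_size * patch_size)
    new_w = max(patch_size, width // patch_size * patch_size)
    return new_h, new_w, new_h // patch_size, new_w // patch_size


if __name__ == "__main__":
    # A square image at the maximum resolution gives a 70 x 70 grid (980 / 14).
    print(navit_patch_grid(980, 980))
    # A tall 2:1 image keeps its 2:1 shape after downscaling.
    print(navit_patch_grid(1960, 980))
```

Under these assumptions the number of patches varies from image to image, which is why positional embeddings have to be interpolated to the grid shape instead of being looked up from a fixed-size table.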